Abstract

An artificial neural network is usually treated as a whole system, characterized
by its ground state (the global minimum of the energy functional),
the set of fixed points, their basins of attraction, etc. However, it is quite
natural to suppose that a large network may consist of a set of almost autonomous
subnets. Each subnet works independently (or almost independently) and
analyzes the same pattern from a different point of view. This seems to be an appropriate
model for natural neural networks. We discuss the problem of
decomposing a neural network into a set of weakly coupled subnets. The technique
used is similar to the method of extremal grouping of parameters
proposed by E.M. Braverman (1970).

I. HOPFIELD'S MODEL OF A NEURAL NETWORK 

A neural network of size $n$ is a set of $n$ connected spin variables (spins) $\sigma_i$; each $\sigma_i$ can
be either $1$ or $-1$:

$$\sigma_i = \pm 1, \quad i = 1, 2, \ldots, n. \tag{1}$$

The interaction between spins is described by a connection matrix. Let $J_{ii'}$ be the connection
strength between the spins $\sigma_i$ and $\sigma_{i'}$, and let $\sigma_i(t)$ be the value of the $i$th spin at time $t$; then

$$h_i(t) = \sum_{i'=1}^{n} J_{ii'}\,\sigma_{i'}(t) \tag{2}$$

represents the local field that the spin $\sigma_i$ experiences at time $t$. Under the action of this
field the new value of the spin $\sigma_i$ at the next moment $t + 1$ is:

$$\sigma_i(t+1) = \begin{cases} \sigma_i(t), & \text{if } h_i(t)\,\sigma_i(t) \ge 0, \\ -\sigma_i(t), & \text{if } h_i(t)\,\sigma_i(t) < 0. \end{cases} \tag{3}$$

Vectors whose coordinates are all $\pm 1$ are called configuration vectors. We denote
configuration vectors by small Greek letters.

It is convenient to describe the state of the network at time $t$ by the $n$-dimensional
configuration vector

$$\sigma(t) = (\sigma_1(t), \sigma_2(t), \ldots, \sigma_n(t)).$$

If we introduce the connection matrix $J = (J_{ii'})$ and define the quadratic form

$$E(t) = -\sum_{i,i'=1}^{n} J_{ii'}\,\sigma_i(t)\,\sigma_{i'}(t) = -(J\sigma(t), \sigma(t)), \tag{4}$$

then it is easy to show that for any symmetric connection matrix $J$ the overturn of a spin
$\sigma_i(t)$ whose value does not coincide with the sign of $h_i(t)$ leads to a decrease of $E(t)$,
because $\sigma_i(t)\,h_i(t) < 0$ for such a spin:

$$E(t+1) = E(t) + 4\,\sigma_i(t)\,h_i(t). \tag{5}$$
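The identity in Eq. (5) can be checked numerically. The following Python sketch is an illustration only and is not taken from the original paper: the helper names `local_field`, `energy` and `random_symmetric_matrix`, the network size and the random integer couplings are all assumptions made for the example. It flips each spin of a random configuration in turn and compares the resulting energy change with $4\,\sigma_i h_i$.

```python
import random

def local_field(J, sigma, i):
    """Eq. (2): h_i = sum over i' of J[i][i'] * sigma[i']."""
    return sum(J[i][k] * sigma[k] for k in range(len(sigma)))

def energy(J, sigma):
    """Eq. (4): E = -sum over i, i' of J[i][i'] * sigma[i] * sigma[i']."""
    n = len(sigma)
    return -sum(J[i][k] * sigma[i] * sigma[k] for i in range(n) for k in range(n))

def random_symmetric_matrix(n, rng):
    """Hypothetical test matrix: symmetric, integer entries, zero diagonal."""
    J = [[0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1, n):
            J[i][k] = J[k][i] = rng.randint(-3, 3)
    return J

rng = random.Random(0)
n = 6
J = random_symmetric_matrix(n, rng)
sigma = [rng.choice((-1, 1)) for _ in range(n)]

# Flip every spin in turn and compare the energy change with 4 * sigma_i * h_i.
flip_checks = []
for i in range(n):
    h_i = local_field(J, sigma, i)
    flipped = sigma[:i] + [-sigma[i]] + sigma[i + 1:]
    flip_checks.append(energy(J, flipped) - energy(J, sigma) == 4 * sigma[i] * h_i)

print(all(flip_checks))
```

Integer couplings are used so that the comparison is exact. The check relies on the two assumptions stated in the text: a symmetric matrix and no self-interaction.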

$E(t)$ can be interpreted as the energy of the state $\sigma(t)$. As the number of network states is
finite and the $i$th spin does not turn over if $h_i(t) = 0$, the final state of
the network is a state that corresponds to a minimum (possibly a local one) of the energy
$E(t)$. In such a state every spin $\sigma_i$ is aligned with its local field $h_i$ and there is no
further evolution of the network. These states are called the fixed points of the network.
Consequently, if the configuration vector $\sigma^* = (\sigma_1^*, \sigma_2^*, \ldots, \sigma_n^*)$ is a fixed point, then

$$\sigma_i^* = \operatorname{sgn}\Big(\sum_{i'=1}^{n} J_{ii'}\,\sigma_{i'}^*\Big), \quad i = 1, 2, \ldots, n. \tag{6}$$

1 For the sake of simplicity we suppose that there is no self-interaction in the system: $J_{ii} = 0 \;\forall i$.

In what follows the configuration vectors which are fixed points will be marked by super- 
scripts "*". 
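The relaxation to a fixed point can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the names `relax` and `is_fixed_point`, the sweep order (spins are updated one at a time, in index order), the `max_sweeps` safeguard and the random integer couplings are all choices made for the example.

```python
import random

def local_field(J, sigma, i):
    """Eq. (2): field acting on spin i."""
    return sum(J[i][k] * sigma[k] for k in range(len(sigma)))

def is_fixed_point(J, sigma):
    """True if no spin is misaligned with its local field (h_i * sigma_i >= 0 for all i)."""
    return all(local_field(J, sigma, i) * sigma[i] >= 0 for i in range(len(sigma)))

def relax(J, sigma, max_sweeps=1000):
    """Apply rule (3) to one spin at a time until a full sweep changes nothing."""
    sigma = list(sigma)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(sigma)):
            if local_field(J, sigma, i) * sigma[i] < 0:
                sigma[i] = -sigma[i]  # each such flip lowers the energy (4)
                changed = True
        if not changed:
            return sigma
    raise RuntimeError("no fixed point reached within max_sweeps")

# Hypothetical symmetric coupling matrix with zero diagonal.
rng = random.Random(0)
n = 8
J = [[0] * n for _ in range(n)]
for i in range(n):
    for k in range(i + 1, n):
        J[i][k] = J[k][i] = rng.randint(-3, 3)

start = [rng.choice((-1, 1)) for _ in range(n)]
final = relax(J, start)
print(is_fixed_point(J, final))
```

Because every accepted flip strictly decreases the energy and the number of configurations is finite, the loop terminates for any symmetric matrix with zero diagonal; the `max_sweeps` limit is only a safeguard.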

Let us define the neural network known as Hopfield's network. Let $p$ be the number of
preassigned configuration vectors $\xi^{(l)}$, which are called the memorized patterns:

$$\xi^{(l)} = (\xi_1^{(l)}, \xi_2^{(l)}, \ldots, \xi_n^{(l)}), \quad l = 1, 2, \ldots, p. \tag{7}$$

(The superscripts number the vectors from $R^n$ and the subscripts number their coordinates.
Usually it is assumed that $p < n$ or even $p \ll n$.) J. Hopfield [1] proposed to use the
connection matrix of the form:

$$J_{ii'} = \begin{cases} \sum_{l=1}^{p} \xi_i^{(l)}\,\xi_{i'}^{(l)}, & i \ne i', \\ 0, & i = i', \end{cases} \quad i, i' = 1, 2, \ldots, n. \tag{8}$$



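The matrix $J$ (8) is a symmetric matrix with zero diagonal elements. Then the fixed points
are the minima of the energy $E$ given by Eq.(4). If we define the $(p \times n)$-matrix $\Xi$ with the $p$
memorized patterns (7) as its rows,

$$\Xi = \begin{pmatrix} \xi^{(1)} \\ \xi^{(2)} \\ \vdots \\ \xi^{(p)} \end{pmatrix} = \begin{pmatrix} \xi_1^{(1)} & \xi_2^{(1)} & \cdots & \xi_n^{(1)} \\ \xi_1^{(2)} & \xi_2^{(2)} & \cdots & \xi_n^{(2)} \\ \vdots & \vdots & & \vdots \\ \xi_1^{(p)} & \xi_2^{(p)} & \cdots & \xi_n^{(p)} \end{pmatrix}, \tag{9}$$

then the expression for the connection matrix takes the form

$$J = \Xi^T \Xi - p\,I, \tag{10}$$

where the $(n \times p)$-matrix $\Xi^T$ is the transpose of the matrix $\Xi$ and $I$ is the unit matrix in the
space $R^n$. Since $(\sigma, \sigma) = n$ for every configuration vector, Eqs.(4) and (10) give
$-E = \|\Xi\sigma\|^2 - pn$. Therefore the search for the fixed points of Hopfield's network reduces to the
maximization of the functional

$$\|\Xi\sigma\|^2 = (\Xi^T \Xi\,\sigma, \sigma).$$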
But this problem can be reformulated if we introduce the $n$ $p$-dimensional vectors $\xi_i$ that are the columns
of the matrix $\Xi$:

$$\xi_i = \begin{pmatrix} \xi_i^{(1)} \\ \xi_i^{(2)} \\ \vdots \\ \xi_i^{(p)} \end{pmatrix}, \quad i = 1, 2, \ldots, n. \tag{11}$$

In contrast to the $n$-dimensional vectors $\xi^{(l)}$ defined by Eq.(7), here the subscripts number the
vectors $\xi_i$ from $R^p$ and the superscripts number their coordinates.

Since $\Xi\sigma = \sum_{i} \sigma_i\,\xi_i$, it is easy to see that the problem of maximizing the functional $\|\Xi\sigma\|^2$ takes the
form:

$$\Big\| \sum_{i=1}^{n} \sigma_i\,\xi_i \Big\| \to \max, \quad \text{where } \sigma_i = \pm 1 \;\forall i. \tag{12}$$

In other words, we have to find a weighted sum of the $p$-dimensional vectors $\xi_i$, with
weights equal to $\pm 1$, whose length is maximal. In what follows the expression
(12) is the starting point of our consideration.
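The chain from Eq. (8) to Eq. (12) can be verified on a small example. The sketch below is an illustration under assumptions: the sizes `p = 3`, `n = 7`, the random patterns and the helper names `energy` and `squared_length` are hypothetical. It builds the connection matrix from the patterns, checks the relation $-E = \|\Xi\sigma\|^2 - pn$ for every configuration, and confirms that the configuration of minimal energy is also one of maximal length.

```python
from itertools import product
import random

rng = random.Random(1)
p, n = 3, 7

# Rows of Xi are the memorized patterns xi^(l), as in Eqs. (7) and (9).
Xi = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(p)]

# Eq. (8): J[i][k] = sum over l of xi_i^(l) * xi_k^(l) for i != k, zero diagonal.
J = [[0 if i == k else sum(Xi[l][i] * Xi[l][k] for l in range(p))
      for k in range(n)] for i in range(n)]

def energy(J, sigma):
    """Eq. (4): E = -(J sigma, sigma)."""
    return -sum(J[i][k] * sigma[i] * sigma[k] for i in range(n) for k in range(n))

def squared_length(Xi, sigma):
    """||Xi sigma||^2, the squared length of sum_i sigma_i * xi_i from Eq. (12)."""
    return sum(sum(Xi[l][i] * sigma[i] for i in range(n)) ** 2 for l in range(p))

# Check -E = ||Xi sigma||^2 - p*n over all 2**n configurations.
identity_holds = all(
    -energy(J, sigma) == squared_length(Xi, sigma) - p * n
    for sigma in product((-1, 1), repeat=n)
)

# The energy minimum and the length maximum are attained at the same energy level.
ground_state = min(product((-1, 1), repeat=n), key=lambda s: energy(J, s))
longest = max(product((-1, 1), repeat=n), key=lambda s: squared_length(Xi, s))
same_value = energy(J, ground_state) == energy(J, longest)

print(identity_holds, same_value)
```

Exhaustive enumeration is used only because the example is tiny; it is not a practical way to find fixed points of a large network.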

II. FACTOR ANALYSIS AND EXTREMAL GROUPING OF PARAMETERS 



The problem (12) is a special case of a problem that is well known from the centroid
method of factor analysis [2]. The basic idea of factor analysis is to replace the large
number of parameters that describe the objects under investigation by a considerably
smaller set of specially constructed characteristics, provided that such a replacement does not
lead to the loss of essential information about these objects.

This idea can be formalized in the following way. Let us have $p$ objects
represented by the vectors $x^{(l)} = (x_1^{(l)}, x_2^{(l)}, \ldots, x_n^{(l)})$, $l = 1, 2, \ldots, p$, in the space
$R^n$. Consider the $(p \times n)$-matrix $X$ whose rows are the object-vectors $x^{(l)}$. (This
matrix is an analog of the matrix $\Xi$ (9), but now the matrix elements can be arbitrary
real numbers, and not only $\pm 1$.) On the other hand, the matrix $X$ can be described as the
matrix whose columns are the parameter-vectors $x_i$:

$$x_i = \begin{pmatrix} x_i^{(1)} \\ x_i^{(2)} \\ \vdots \\ x_i^{(p)} \end{pmatrix}, \quad i = 1, 2, \ldots, n.$$

(We recall that the vectors from the space $R^n$ are numbered by superscripts, $l = 1, \ldots, p$,
and the vectors from the space $R^p$ by subscripts, $i = 1, \ldots, n$.)

If a relatively small number $t$ ($t \ll n$) of $p$-dimensional vectors $f_1, f_2, \ldots, f_t$ can
be found such that the parameter-vectors $x_i$ can be represented in the form

$$x_i = \sum_{s=1}^{t} a_{is}\,f_s + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

where the remainders $\varepsilon_i$ are small in some sense and can be omitted, then the objects can be
described by the characteristics $f_s$ instead of the initially used parameters $x_i$. Indeed, owing
to the smallness of the remainders $\varepsilon_i$, the characteristics $f_s$ adequately describe the investigated
phenomenon, and it is much more convenient to work when the number of parameters is
considerably reduced. The characteristics $f_s$ are called the essential factors.

The various models of factor analysis differ in the form in which the factors $f_s$ are
sought and in the sense in which the smallness of the remainders $\varepsilon_i$ is understood. In the centroid method the
first factor $f_1$ is sought as a linear combination $\sum_{i=1}^{n} \sigma_i\,x_i$ of the parameters $x_i$ with
weights $\sigma_i = \pm 1$ that has maximal length:

$$f_1 \propto \sum_{i=1}^{n} \sigma_i^*\,x_i, \quad \text{where } \Big\|\sum_{i=1}^{n} \sigma_i^*\,x_i\Big\| = \max_{\sigma_i = \pm 1} \Big\|\sum_{i=1}^{n} \sigma_i\,x_i\Big\|. \tag{13}$$



A comparison of Eq.(12) and Eq.(13) shows that the problem of searching for the fixed points
of the network is equivalent to the construction of the first centroid factor for the set of
$p$-dimensional vectors $\xi_i$ (11).
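For a handful of vectors the first centroid factor of Eq. (13) can be found by exhaustive search over the signs. The sketch below is only an illustration: the function name `first_centroid_factor` and the three sample vectors are assumptions, the input is assumed to contain at least one vector, and the returned sum is left unnormalized (Eq. (13) fixes the factor only up to proportionality).

```python
from itertools import product
import math

def first_centroid_factor(vectors):
    """Brute-force search for Eq. (13); feasible only for small n (2**(n-1) sign patterns)."""
    n, p = len(vectors), len(vectors[0])
    best_length, best_signs, best_sum = -1.0, None, None
    # sigma and -sigma give the same length, so the first sign is fixed to +1.
    for rest in product((1, -1), repeat=n - 1):
        signs = (1,) + rest
        total = [sum(s * v[j] for s, v in zip(signs, vectors)) for j in range(p)]
        length = math.sqrt(sum(c * c for c in total))
        if length > best_length:
            best_length, best_signs, best_sum = length, signs, total
    return best_length, best_signs, best_sum

# Hypothetical parameter-vectors in R^2.
x = [(2, 0), (-1, 0), (0, 1)]
length, signs, factor = first_centroid_factor(x)
print(signs, factor)  # (1, -1, 1) [3, 1]
```

The second vector receives the weight $-1$ because it points against the first one; reversing it makes the two contributions add up instead of cancelling.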

In the centroid method, after the construction of the first factor $f_1$, the vectors $b_{i1}\,f_1$,
where $b_{i1}$, $i = 1, 2, \ldots, n$, are some coefficients, are subtracted from each parameter-vector $x_i$.
In this way we obtain a new set of vectors $x_i' = x_i - b_{i1}\,f_1$, for which their own factor is
constructed by analogy. This factor is the second factor for the initial parameters $x_i$.
The process is repeated until the vectors obtained after the next step would
be small enough. For details see 

An important generalization of factor analysis was the idea of the extremal grouping
of parameters suggested by E.M. Braverman in 1970 [3]. Braverman introduced a model
of factor analysis in which an essentially nonuniform distribution of the vectors $x_i$ in the
space $R^p$ is taken into account.

Indeed, if the number n of the parameter-vectors is very large, it is possible that they 
can be divided into some compact groups such that the vectors joined into one group are 
"strongly correlated" with each other and are "weakly correlated" with the parameters 
included into other groups. Then it is reasonable to construct the factors not for the full 
set of the parameter-vectors, but for every compact group separately. If these groups are 
compact enough, we can restrict ourselves to the first factor of each group only. To
divide the parameter-vectors into these compact groups, Braverman suggested an approach 
connected with the maximization of a certain functional depending both on the grouping of 
the parameters and on the choice of the factors. 

Let us write down Braverman's functional. Let the $p$-dimensional vectors $x_1, x_2, \ldots, x_n$ be
divided into $t$ disjoint groups $A_1, A_2, \ldots, A_t$:

$$A_1 \cup A_2 \cup \cdots \cup A_t = \{1, 2, \ldots, n\}.$$

For every group $A_s$ the first centroid factor can be constructed as the solution of the problem:

$$\Big\|\sum_{i \in A_s} \sigma_i^*\,x_i\Big\| = \max_{\sigma_i = \pm 1} \Big\|\sum_{i \in A_s} \sigma_i\,x_i\Big\|. \tag{14}$$

Then the partition into the $t$ most compact groups is obtained by maximizing
the functional

$$M(A_1, A_2, \ldots, A_t) = \Big\|\sum_{i \in A_1} \sigma_i^*\,x_i\Big\| + \Big\|\sum_{i \in A_2} \sigma_i^*\,x_i\Big\| + \ldots + \Big\|\sum_{i \in A_t} \sigma_i^*\,x_i\Big\| \to \max, \tag{15}$$

where $\sigma_i^*$ are the solutions of the problem (14) for every group $A_s$, $s = 1, 2, \ldots, t$. We
note that, although the problem of maximizing the functional (15) is very hard, the
method of extremal grouping of parameters has been used successfully for various problems
in engineering, economics, sociology, psychology and other fields 0|§. 
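To make the functional (15) concrete, the following sketch evaluates it for two different partitions of the same four vectors. It is an illustration under assumptions, not Braverman's algorithm: the names `best_length` and `grouping_functional` and the sample vectors are hypothetical, and the inner maximization (14) is done by brute force, which is practical only for very small groups.

```python
from itertools import product
import math

def best_length(vectors):
    """Eq. (14) by brute force: max over signs of || sum_i sigma_i * x_i ||."""
    p = len(vectors[0])
    best = 0.0
    for signs in product((1, -1), repeat=len(vectors)):
        total = [sum(s * v[j] for s, v in zip(signs, vectors)) for j in range(p)]
        best = max(best, math.sqrt(sum(c * c for c in total)))
    return best

def grouping_functional(vectors, groups):
    """Eq. (15): sum over the groups of the best signed-sum length."""
    return sum(best_length([vectors[i] for i in group]) for group in groups)

# Hypothetical data: vectors 0 and 1 are nearly parallel, and so are vectors 2 and 3.
x = [(3, 0), (3, 1), (0, 3), (1, 3)]
compact = grouping_functional(x, [[0, 1], [2, 3]])
mixed = grouping_functional(x, [[0, 2], [1, 3]])
print(round(compact, 3), round(mixed, 3))  # 12.166 9.899
```

The partition that keeps strongly correlated vectors together gives the larger value of the functional, which is the behaviour the method relies on.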

III. NEURAL NETWORKS DECOMPOSITION INTO SOME SUBNETS 

Let the vectors $x_i$ in Eqs.(14),(15) be replaced by the vectors $\xi_i$ from Eq.(11), i.e. only
vectors with coordinates $\pm 1$ are under consideration. Then, in the framework of
the neural network paradigm, the problem (14), (15) can be interpreted as the problem of
grouping the network neurons into connected groups.

Indeed, natural networks have an evident differentiated structure: different neuron groups
have different functions and are responsible for the regulation/analysis of different aspects of a
complicated pattern that is processed by the network. To some extent every such neuron
group can be treated as an autonomous neural network of smaller size which deals
with some specific features of the pattern.

Let a network consist of some groups of neurons (subnets) $A_1, A_2, \ldots, A_t$. There
is one universal mechanism for the functioning of all network neurons: a spin $\sigma_i$ turns over
if its sign does not coincide with the sign of the field $h_i$ acting on this spin. However, it is
reasonable to assume that the incoming excitations from the neurons belonging to the same
group as the neuron $\sigma_i$ affect this neuron more strongly than the excitations from the neurons
of other groups (those which analyze the same pattern from other points of view). This
hierarchy of excitations can be modelled in different ways. As an initial model it can be
assumed that

$$h_i(t) = \sum_{i' \in A_s} J_{ii'}\,\sigma_{i'}(t), \tag{16}$$

where the summation is taken over all neurons belonging to the same group $A_s$ as the $i$th
neuron.
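The difference between the full local field of Eq. (2) and the group-restricted field of Eq. (16) can be seen on a four-neuron example. The sketch is illustrative only: the matrix entries, the configuration, the partition and the function names `full_field` and `group_field` are assumptions chosen for the example.

```python
def full_field(J, sigma, i):
    """Eq. (2): the sum runs over all neurons of the network."""
    return sum(J[i][k] * sigma[k] for k in range(len(sigma)))

def group_field(J, sigma, groups, i):
    """Eq. (16): the sum runs only over the neurons in the same group as neuron i."""
    members = next(g for g in groups if i in g)
    return sum(J[i][k] * sigma[k] for k in members)

# Hypothetical symmetric couplings with zero diagonal.
J = [
    [0, 2, 1, 0],
    [2, 0, 0, -1],
    [1, 0, 0, 3],
    [0, -1, 3, 0],
]
sigma = [1, -1, 1, 1]
groups = [[0, 1], [2, 3]]

fields_full = [full_field(J, sigma, i) for i in range(4)]
fields_group = [group_field(J, sigma, groups, i) for i in range(4)]
print(fields_full)   # [-1, 1, 4, 4]
print(fields_group)  # [-2, 2, 3, 3]
```

The two fields differ exactly by the contribution of the couplings between the groups (here the entries `J[0][2]` and `J[1][3]`), which model (16) switches off.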

The subnet consisting of the neurons from the group $A_s$ evolves to one of its fixed
points. This leads us to the problem (14). The network as a whole acts so as to maximize
the composite functional

$$M(\{\sigma_i\}) = \Big\|\sum_{i \in A_1} \sigma_i\,\xi_i\Big\| + \Big\|\sum_{i \in A_2} \sigma_i\,\xi_i\Big\| + \ldots + \Big\|\sum_{i \in A_t} \sigma_i\,\xi_i\Big\| \to \max. \tag{17}$$

We have discussed the situation when the neurons are already decomposed into groups
$A_1, A_2, \ldots, A_t$. If the structure of the groups is unknown but their number $t$ is fixed, it is
necessary to maximize the functional (17) with respect to the structure of the groups $A_s$
as well as with respect to all the weights $\sigma_i$ inside every group. In this case Eqs.(14),(15),
with the vectors $\xi_i$ substituted for the vectors $x_i$, describe the optimal
decomposition of the network into $t$ autonomous subnets.

Some remarks are in order here. Firstly, it is easy to see that when the number of
groups $t$ increases, the functional $M(A_1, \ldots, A_t)$ (15) is nondecreasing (this follows from the
triangle inequality). This functional attains its global maximum when the number of
groups $t$ is equal to $n$. However, this is a trivial decomposition. Simple geometric arguments
show that when a group of strongly correlated vectors $x_i$ is divided into two subgroups, the
functional (15) increases negligibly. So the problem is not to reach the global maximum of
the functional (15), but to find a number $t^*$ of groups beyond which
a further increase in the number of groups does not lead to a substantial increase
of this functional. This $t^*$ can be regarded as the proper number of
subnets that constitute the initial $n$-network.
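The monotonicity claim can be spelled out in one line. The derivation below is added for clarity; $B$ and $C$ are names introduced here (not in the original text) for two disjoint non-empty subgroups into which a group $A_s$ is split, and $\sigma_i^*$ are the optimal weights for the unsplit group.

```latex
% Splitting A_s = B \cup C cannot decrease the contribution of A_s to (15).
\Big\| \sum_{i \in A_s} \sigma_i^* x_i \Big\|
  = \Big\| \sum_{i \in B} \sigma_i^* x_i + \sum_{i \in C} \sigma_i^* x_i \Big\|
  \le \Big\| \sum_{i \in B} \sigma_i^* x_i \Big\| + \Big\| \sum_{i \in C} \sigma_i^* x_i \Big\|
  \le \max_{\sigma_i = \pm 1} \Big\| \sum_{i \in B} \sigma_i x_i \Big\|
    + \max_{\sigma_i = \pm 1} \Big\| \sum_{i \in C} \sigma_i x_i \Big\| .
```

The first inequality is the triangle inequality; the second holds because the weights optimal for $A_s$ are only one admissible choice for each subgroup. Repeating the split until every group contains a single vector gives the trivial upper bound $\sum_{i=1}^{n} \|x_i\|$ attained at $t = n$.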

Furthermore, it is reasonable to try to interpret the specific characteristics of each
obtained subnet in meaningful terms. In other words, we can try to understand what kind of
pattern characteristics are analyzed by each particular subnet, i.e. we must determine
what kind of neurons are joined in the group. On this step the monograph [|J, which reflects 
the accumulated experience in this field, can be useful. 

Secondly, the above program can be carried out only if we are able to solve two
problems: A) to find the compact groups of the vectors $\xi_i$; B) to determine the optimal
configuration $\{\sigma_i^*\}$, $i \in A_s$, for each group. As for problem B, virtually all
attempts to create an effective algorithm for maximizing the functional $(J\sigma, \sigma)$ are
devoted to this problem.

Problem A is much less studied and seems to be more complicated. Usually it is
solved by transferring $p$-dimensional vectors step by step from one group to another and
comparing the values of the functional (15) for the successively obtained groupings.
When $n$ is rather large, only a local maximum of the
functional $M$ is guaranteed in this way. We know of only a few papers [5,6] devoted specifically to the
problem of finding the global maximum of a functional of type (15). In these papers
the general case of vectors $x_i$ with real coordinates is studied. As for neural networks, the
vectors $\xi_i$ are specific: their coordinates are $\pm 1$. It can be hoped that this specific character
of the vectors $\xi_i$ will make it possible to devise an effective method for finding the
compact groups.
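The step-by-step transfer procedure mentioned above can be sketched as a simple local search. This is a hedged illustration, not the algorithm of any cited paper: the function names, the first-improvement acceptance rule, the tolerance `1e-12`, the rule that a group is never emptied (so that $t$ stays fixed) and the sample vectors are all assumptions. The inner problem (14) is solved by brute force, so the sketch applies only to very small examples, and it guarantees only a local maximum.

```python
from itertools import product
import math

def best_length(vectors):
    """Eq. (14) by brute force: max over signs of || sum_i sigma_i * x_i ||."""
    p = len(vectors[0])
    best = 0.0
    for signs in product((1, -1), repeat=len(vectors)):
        total = [sum(s * v[j] for s, v in zip(signs, vectors)) for j in range(p)]
        best = max(best, math.sqrt(sum(c * c for c in total)))
    return best

def grouping_functional(vectors, groups):
    """Eq. (15) for a given partition (lists of indices)."""
    return sum(best_length([vectors[i] for i in g]) for g in groups)

def transfer_search(vectors, groups):
    """Move one index at a time between groups while the functional increases."""
    groups = [list(g) for g in groups]
    best = grouping_functional(vectors, groups)
    improved = True
    while improved:
        improved = False
        for src in range(len(groups)):
            for item in list(groups[src]):
                if len(groups[src]) == 1:
                    continue  # never empty a group: the number of groups t is fixed
                for dst in range(len(groups)):
                    if dst == src:
                        continue
                    groups[src].remove(item)
                    groups[dst].append(item)
                    value = grouping_functional(vectors, groups)
                    if value > best + 1e-12:
                        best, improved = value, True
                        break  # keep the move and go on to the next index
                    groups[dst].remove(item)  # undo the trial move
                    groups[src].append(item)
    return groups, best

# Hypothetical data: two pairs of nearly parallel vectors, started from a mixed partition.
x = [(3, 0), (3, 1), (0, 3), (1, 3)]
start = [[0, 2], [1, 3]]
initial_value = grouping_functional(x, start)
final_groups, final_value = transfer_search(x, start)
print(sorted(sorted(g) for g in final_groups), round(final_value, 3))
```

On this example the search ends with the two nearly parallel pairs grouped together. For larger problems the result depends on the starting partition and on the order in which transfers are tried.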

And the last remark. Although the proposed approach was formulated for Hopfield's
model, it can be generalized to the case of an arbitrary symmetric connection matrix: it is
sufficient to replace in Eqs. (14), (15) and (17) the term

$$\Big\|\sum_{i \in A_s} \sigma_i\,\xi_i\Big\|$$



by 




and all the reasoning remains valid.